Based on the concept of many-letter theory, an observable is defined measuring the raw quantum 
information content of single messages. A general characterization of quantum codes using the 
Kraus representation is given. Compression codes are defined by their property of decreasing the 
expected raw information content of a given message ensemble. Lossless quantum codes, in contrast 
to lossy codes, provide the retrieval of the original data with perfect fidelity. A general lossless 
coding scheme is given that translates between two quantum alphabets. It is shown that this 
scheme is never compressive. Furthermore, a lossless quantum coding scheme, analogous to the classical 
Huffman scheme but different from the Braunstein scheme, is implemented, which provides optimal 
compression. Motivated by the concept of lossless quantum compression, an observable is defined 
that measures the core quantum information content of a particular message with respect to a given 
a priori message ensemble. The average of this observable yields the von Neumann entropy. 



I. INTRODUCTION 

In [n|, the concept of a quantum information the- 
ory generalized to messages with components of variable 
length has been presented, here referred to as many- 
letter theory. Based on this concept, an observable is 
defined measuring the raw quantum information content 
of single messages. A general characterization of coding 
schemes using the Kraus representation is given. Com- 
pression then means decreasing the expected raw infor- 
mation content of a given message ensemble. Apart from 
lossy schemes like the Schumacher coding scheme, which 
compresses quantum messages by neglecting "unimpor- 
tant" information, a lossless coding scheme can be im- 
plemented, which not only ensures perfect fidelity in re- 
trieving the original messages but also provides optimal 
compression. This coding scheme differs from the Braun- 
stein scheme presented in Q, mostly in that it is a per- 
fectly lossless code and, since it exploits the features of 
many-letter space, it cannot be implemented in a stan- 
dard block Hilbert space. Motivated by the concept of 
lossless compression, a quantum mechanical observable 
is defined that measures the amount of core quantum in- 
formation contained in a particular message with respect 
to a given a priori message ensemble. 

This paper is separated into two parts. The first part 
reviews roughly the basic concepts of classical coding in 
order to motivate the corresponding notions presented in 
the second part, which is dedicated to quantum coding. 
A detailed summary of classical information theory can 
be found in |g], a very recommendable review on quan- 
tum information theory is given in P|. 



II. CLASSICAL INFORMATION THEORY 



A. Notions and definitions 

The following notions and definitions are used throughout this paper. The reader is referred to [1] for further details.

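A classical message is a string $\boldsymbol{x}$ of letters $x$ taken from an alphabet $\mathcal{A}$ of size $|\mathcal{A}|$ and is denoted by $\boldsymbol{x} = (x_1 \cdots x_n)$. Strings of length $n$ are explicitly denoted by

$$x^n := (x_1 \cdots x_n) . \tag{1}$$

The set of block messages $x^N$ of fixed length $N$ is written as

$$\mathcal{A}^N := \{ (x_1 \cdots x_N) \mid x_n \in \mathcal{A} \} . \tag{2}$$

Let us also allow for the empty message $x^0 = (\cdot)$ that forms the set $\mathcal{A}^0 := \{(\cdot)\}$. The set of all messages of finite length is defined by

$$\mathcal{A}^+ := \bigcup_{n=0}^{\infty} \mathcal{A}^n . \tag{3}$$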



A general message ensemble is represented by a random variable

$$X := \{ [x^n, p(x^n)] \mid x^n \in \Omega \} \tag{4}$$

of strings $x^n$ drawn with a priori probabilities $p(x^n) > 0$ from a source set $\Omega$, such that $\sum_{x^n \in \Omega} p(x^n) = 1$. A canonical message $x^N$ is drawn from the ensemble

$$X^N := \{ [x^N, p(x^N)] \mid x^N \in \mathcal{A}^N \} \tag{5}$$

with factorizing a priori probability $p(x^N) = p(x_1) \cdots p(x_N)$.



B. Raw Information content 

There are several ways to think about an "amount of information" carried by messages. I will take a rather pragmatic point of view by simply asking "How much effort does it take to communicate the message?". Say Alice builds a device to send messages to Bob, who in turn builds the adequate receiver. For every single letter Alice has to use a sender unit that can create any of the alphabet letters. The more letters in the alphabet, the more complicated the device, hence the more effort it takes to communicate the messages. If the length of the message is $N$, then $N$ of the sender units are in use. Bob builds enough receiver units to receive a message of arbitrary length. He adds a meter on top of his receiver indicating the number of active receiver units and calls the indicated value the raw information content of the received message. He calibrates the pointer to show exactly 1 unit of information whenever the message contains the smallest amount of information, given by a single-letter message over the binary alphabet. He calls the unit of this information "1 bit". To put it mathematically, we define the raw information content as a function $I : \mathcal{A}^+ \to [0, \infty)$ with

$$I(\boldsymbol{x}) := \log|\mathcal{A}| \, L(\boldsymbol{x}) , \tag{6}$$

where $L(\boldsymbol{x})$ is the length function on $\mathcal{A}^+$ and the logarithm is taken to base 2. The unit of this information measure is "1 bit". For example, the information content of a message of length $n$ over the binary alphabet $\mathcal{A} = \{0, 1\}$ is $I(x^n) = \log 2 \cdot L(x^n) = n$ bits. Note that this measure applies to single messages rather than message ensembles. No statistical information is needed; it is just an observable that can be realized by some measuring apparatus. The value of $I$ is indicated by a disk manager as the file size on a hard disk or by an internet browser as the received information during a download process. The bigger this number, the longer it takes to open, edit, save or download a particular message. In this sense it is a true physical observable. Taking into account the statistical properties of a given message ensemble $X$, one may define the ensemble raw information content by

$$\bar{I}(X) := \sum_{\boldsymbol{x} \in \Omega} p(\boldsymbol{x}) \log|\mathcal{A}| \, L(\boldsymbol{x}) . \tag{7}$$

The bigger this number, the bigger the effort of communication on the average. Although a lot of other information measures are possible, the measure (6) and its average (7) will suffice for our purposes.
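The measure (6) and its ensemble average (7) can be sketched in a few lines of Python. This is only an illustration: the function names and the example ensemble are assumptions made here, and messages are represented as plain Python strings.

```python
import math

def raw_information(message, alphabet_size):
    """Raw information content I(x) = log2|A| * L(x), in bits (Eq. 6)."""
    return math.log2(alphabet_size) * len(message)

def ensemble_raw_information(ensemble, alphabet_size):
    """Ensemble average of I(x) for an ensemble {message: probability} (Eq. 7)."""
    return sum(p * raw_information(x, alphabet_size) for x, p in ensemble.items())

# Hypothetical ensemble of binary strings with variable length.
ensemble = {"0": 0.5, "10": 0.25, "110": 0.25}

print(raw_information("0110", 2))             # four binary letters: 4 bits
print(ensemble_raw_information(ensemble, 2))  # 0.5*1 + 0.25*2 + 0.25*3 = 1.75 bits
```

Note that `raw_information` needs no probabilities at all, which reflects the point made above: the raw information content is a property of the single message.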



III. CLASSICAL CODING

A. General types of codes

Alice and Bob decide to use a code while exchanging their messages. If encoding and decoding is easy but the decoding scheme is hard to guess, it ensures the security of their conversation and falls in the domain of cryptography. If the code takes advantage of the statistical properties of the used message ensemble, it allows for compression in order to minimize the effort of transmission or storage of the data. Mathematically, a code is just a mapping $c$ from a given source set $\Omega$ of messages composed from a source alphabet $\mathcal{A}$ to a code set $\Omega_c$ of code messages composed from a code alphabet $\mathcal{A}_c$. A code can be classified into two types:

• A lossless code is uniquely decodable, i.e. $\forall \boldsymbol{x}, \boldsymbol{y} \in \Omega, \boldsymbol{x} \neq \boldsymbol{y} : c(\boldsymbol{x}) \neq c(\boldsymbol{y})$. For any finite set $M \subset \Omega$ we have $|M| = |c(M)|$.

• A lossy code maps certain messages to the same encoding, i.e. $\exists \boldsymbol{x}, \boldsymbol{y} \in \Omega, \boldsymbol{x} \neq \boldsymbol{y} : c(\boldsymbol{x}) = c(\boldsymbol{y})$. For every finite set $M \subset \Omega$ we have $|M| \ge |c(M)|$. Each time a message is irreversibly encoded, the decoder cannot recover the original message and will give an error. If the probability of error can be made very small, the lossy code may be useful.

Furthermore, there are two other important types of code:

• A block code encodes only block messages of fixed length $N$ over a source alphabet $\mathcal{A}$ to block code messages of fixed size $M$ over a code alphabet $\mathcal{A}_c$. It is a function $c : \Omega \subset \mathcal{A}^N \to \Omega_c \subset \mathcal{A}_c^M$.

• A symbol code encodes messages of any length by encoding each letter separately. If $c : \mathcal{A} \to \mathcal{A}_c^+$ is a code on the source alphabet $\mathcal{A}$, then it can be extended to the code $c : \mathcal{A}^+ \to \mathcal{A}_c^+$ by

$$c(x_1 \cdots x_n) := c(x_1) \cdots c(x_n) . \tag{8}$$

A code $c$ thus maps a source message ensemble $X$ to a code message ensemble $Y = c(X)$, which can be expressed in terms of the source message ensemble as

$$Y = \{ [c(\boldsymbol{x}), p(\boldsymbol{x})] \mid \boldsymbol{x} \in \Omega \} . \tag{9}$$

The transformation to the new ensemble

$$Y = \{ [\boldsymbol{y}, p_c(\boldsymbol{y})] \mid \boldsymbol{y} \in \Omega_c \} \tag{10}$$

is given by $\boldsymbol{y} := c(\boldsymbol{x})$, $\Omega_c := c(\Omega)$ and $p_c(\boldsymbol{y}) := \sum_{\boldsymbol{x} \in \Omega} p(\boldsymbol{x}) \, \delta(c(\boldsymbol{x}), \boldsymbol{y})$, where

$$\delta(\boldsymbol{x}, \boldsymbol{y}) := \begin{cases} 1 & : \boldsymbol{x} = \boldsymbol{y} \\ 0 & : \boldsymbol{x} \neq \boldsymbol{y} \end{cases} \tag{11}$$

is the string version of the Kronecker delta. Note that if $X$ is a canonical message ensemble $X^N$, the code message ensemble $Y = c(X)$ is generally not.
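As a minimal sketch of the notions above, the following Python fragment extends a letter code to strings as in (8), pushes an ensemble through a code as in (9) to (11), and tests losslessness on the source set. The function names, the example ensemble and the two example codes are assumptions chosen for illustration only.

```python
from collections import defaultdict

def extend_symbol_code(letter_code):
    """Extend a letter code {letter: codeword} to strings by concatenation (Eq. 8)."""
    return lambda message: "".join(letter_code[x] for x in message)

def encode_ensemble(ensemble, code):
    """Induced ensemble: p_c(y) = sum_x p(x) * delta(c(x), y) (Eqs. 9-11)."""
    encoded = defaultdict(float)
    for x, p in ensemble.items():
        encoded[code(x)] += p
    return dict(encoded)

def is_lossless(ensemble, code):
    """A code is lossless on the source set if no two messages share an encoding."""
    return len({code(x) for x in ensemble}) == len(ensemble)

# Hypothetical source ensemble over the alphabet {a, b}.
source = {"aa": 0.4, "ab": 0.3, "ba": 0.2, "bb": 0.1}

lossless = extend_symbol_code({"a": "0", "b": "1"})  # symbol code, one bit per letter
lossy = lambda message: message[0]                   # keeps only the first letter

print(encode_ensemble(source, lossless), is_lossless(source, lossless))
print(encode_ensemble(source, lossy), is_lossless(source, lossy))
```

In the lossy case the probabilities of all messages with the same first letter add up in the induced ensemble, which is exactly what the Kronecker delta in $p_c$ expresses.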



B. Binary symbol codes 

A binary symbol code is a symbol code $c : \Omega \subset \mathcal{A}^+ \to \Omega_c \subset \{0, 1\}^+$. There is a connection between the reversibility of binary symbol codes and the lengths of the encoded letters. It is given by the Kraft inequality, which states that the codeword lengths of a lossless binary symbol code must satisfy

$$\sum_{x \in \mathcal{A}} 2^{-L_c(x)} \le 1 , \tag{12}$$

where $L_c(x)$ is the length of the codeword corresponding to $x$.
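The Kraft inequality (12) is easy to check for a given list of codeword lengths. The sketch below uses exact rational arithmetic so that the boundary case of equality is not blurred by rounding; the function names and the example lengths are illustrative assumptions.

```python
from fractions import Fraction

def kraft_sum(codeword_lengths):
    """Left-hand side of the Kraft inequality: sum over letters of 2^(-L_c(x))."""
    return sum(Fraction(1, 2 ** length) for length in codeword_lengths)

def satisfies_kraft(codeword_lengths):
    """True if the lengths are admissible for a lossless binary symbol code."""
    return kraft_sum(codeword_lengths) <= 1

print(kraft_sum([1, 2, 3, 3]), satisfies_kraft([1, 2, 3, 3]))  # sum is exactly 1
print(kraft_sum([1, 1, 2]), satisfies_kraft([1, 1, 2]))        # sum is 5/4, too short
```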



C. Prefix codes 

How can Bob decode a symbol code he received from Alice? By the original definition (8) the code is obtained by encoding each source letter separately and then concatenating the codewords to an entire string. If a code is lossless it is nevertheless possible to decode the message, since by construction there is a distinct code for any of the source messages. Among the lossless symbol codes there is an important class of codes called prefix codes. They are defined by the property that no codeword is a prefix of another codeword. Thus a prefix code is instantaneous, i.e. it can be decoded simply from left to right without looking at the entire string. Telephone numbers are an example of a prefix code. The decoder does not have to wait until the entire phone number is entered; it can proceed connecting while the digits are sequentially transferred. As soon as it arrives at a single telephone device, the connection is established. Luckily, one can prove that whenever the codeword lengths of a symbol code satisfy the Kraft inequality (12), there is a prefix code with the same codeword lengths (see ||], pp 95). In other words, whenever a given symbol code is lossless, it can be replaced by a prefix code with the same codeword lengths. Thus in the following we will always assume a lossless symbol code to be a prefix code, so it can be instantaneously encoded and decoded.
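The instantaneous, left-to-right decoding of a prefix code can be sketched as follows. The letter code used here is a hypothetical example, and the function names are assumptions made for this illustration.

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode_prefix(bits, letter_code):
    """Decode from left to right, emitting a letter as soon as a codeword is complete."""
    inverse = {word: letter for letter, word in letter_code.items()}
    message, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:          # codeword recognized, no look-ahead needed
            message.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(message)

# Hypothetical binary prefix code on a four-letter alphabet.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
encoded = "".join(code[x] for x in "abacd")

print(is_prefix_code(code.values()))
print(encoded, decode_prefix(encoded, code))
```

Because no codeword is a prefix of another one, the buffer can match at most one codeword at any time, so the decoder never has to revise a letter it has already emitted.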



D. Compression codes 

A code maps a given message ensemble to a code message ensemble with the codewords obtaining new lengths. Instead of using the length function $L : \mathcal{A}_c^+ \to \mathbb{N}$ on the code messages one can use a code length function $L_c$ on the source messages, defined by

$$\forall \boldsymbol{x} \in \mathcal{A}^+ : \quad L_c(\boldsymbol{x}) := L(c(\boldsymbol{x})) , \tag{13}$$

giving each source message $\boldsymbol{x} \in \mathcal{A}^+$ the length $L_c(\boldsymbol{x})$ of its code. Consequently, each encoded message obtains an encoded information content,

$$I_c(\boldsymbol{x}) := L_c(\boldsymbol{x}) \log|\mathcal{A}_c| , \tag{14}$$

and the encoded message ensemble obtains an encoded ensemble information content,

$$\bar{I}_c(X) := \sum_{\boldsymbol{x} \in \Omega} p(\boldsymbol{x}) \, I_c(\boldsymbol{x}) . \tag{15}$$

A compression code is a code $c$ fulfilling

$$\bar{I}_c(X) < \bar{I}(X) . \tag{16}$$

E. Block compression 

Block compression (see e.g. 0, g, g) applies to canonical messages $X^N$ of fixed block length $N$, generally given by (5). It is a lossy coding scheme in that it only encodes "typical strings". The key idea of block compression is that the probability of error can be made arbitrarily small by increasing the block size $N$. Define the typical set $T_\delta^N$ with tolerance $\delta$ as the set of all messages $x^N$ drawn from the canonical ensemble $X^N$ whose a priori probabilities fulfill

$$2^{-N(H+\delta)} \le p(x^N) \le 2^{-N(H-\delta)} , \tag{17}$$

with the Shannon entropy $H = H(X)$ of the letter ensemble $X$ being defined as

$$H(X) := -\sum_{x \in \mathcal{A}} p(x) \log p(x) . \tag{18}$$

The probability of a message being in the typical set is given by

$$P_T := P(x^N \in T_\delta^N) = \sum_{x^N \in T_\delta^N} p(x^N) , \tag{19}$$

and the number of typical messages obeys

$$(1-\epsilon) \, 2^{N(H-\delta)} \le |T_\delta^N| \le 2^{N(H+\delta)} . \tag{20}$$

Shannon's noiseless coding theorem states that for all $\epsilon, \delta > 0$ there is an $N_0 \in \mathbb{N}$ such that for any $N > N_0$ we have

$$P_T > 1 - \epsilon . \tag{21}$$
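The definition (17) of the typical set and the probability (19) can be explored numerically for small block sizes. The following sketch enumerates all strings of length $N$ by brute force, which is feasible only for small $N$; the function names, the example letter distribution and the chosen tolerance are assumptions made here for illustration.

```python
import math
from itertools import product

def shannon_entropy(p):
    """H(X) = -sum_x p(x) log2 p(x) for a letter distribution {letter: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def typical_set(p, N, delta):
    """Strings x^N with 2^(-N(H+delta)) <= p(x^N) <= 2^(-N(H-delta)), and their total probability."""
    H = shannon_entropy(p)
    lower, upper = 2 ** (-N * (H + delta)), 2 ** (-N * (H - delta))
    members, total = [], 0.0
    for letters in product(p, repeat=N):
        prob = math.prod(p[x] for x in letters)   # factorizing a priori probability
        if lower <= prob <= upper:
            members.append("".join(letters))
            total += prob
    return members, total

# Hypothetical biased binary source.
p = {"0": 0.9, "1": 0.1}
for N in (4, 8, 12):
    members, P_T = typical_set(p, N, delta=0.3)
    print(N, len(members), round(P_T, 3), 2 ** (N * (shannon_entropy(p) + 0.3)))
```

The upper bound in (20) holds for every $N$, since each typical string carries at least the probability $2^{-N(H+\delta)}$ and the total probability cannot exceed one. The statement $P_T > 1 - \epsilon$, in contrast, is only guaranteed for sufficiently large $N$.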



So if we only encode the typical messages and forget the rest, the probability of success, which is given by (21), will still be satisfactory for $N$ large enough. In that case, the typical set contains approximately $|T_\delta^N| \approx 2^{NH(X)}$ messages to be encoded. The block compression code maps every typical message to a binary string. Since there are $2^{NH(X)}$ distinct messages to be encoded, one needs for every message $x^N$ a binary string of length $L_c(x^N) \approx NH(X)$. Untypical messages can all be mapped to the same arbitrary "junk string". In real life the length of the codewords is given by the integer next above $NH(X)$. Nevertheless, let us view $L_c(x^N)$ as an "ideal length" and accept that it is not an integer. Then the ensemble information of the encoded messages (15) equals the information content of each encoded message:

$$\bar{I}_c(X^N) = \sum_{x^N} p(x^N) \, L_c(x^N) \log 2 \approx N H(X) . \tag{22}$$

The ensemble information (7) of the source messages reads

$$\bar{I}(X^N) = \sum_{x^N} p(x^N) \, L(x^N) \log|\mathcal{A}| = N \log|\mathcal{A}| . \tag{23}$$



Since $H(X) \le \log|\mathcal{A}|$ we have

$$\bar{I}_c(X^N) \le \bar{I}(X^N) , \tag{24}$$

thus condition (16) is satisfied (strictly, whenever the letter distribution is not uniform) and the block compression code is indeed compressive.

In other words, Shannon's noiseless coding theorem states that for canonical messages $X^N$ the information per letter can be compressed to approximately $H(X)$ bits. It is not possible to compress the messages to fewer than $NH(X)$ bits without increasing the probability of error exponentially with $N$. This gives reason to think of the Shannon entropy as some kind of "core information" where all redundancy due to statistical predictability has been removed. But note that block compression only applies to canonical messages. In case of general messages it makes no sense to speak of information per letter, since Alice can choose entire strings of arbitrary length with some a priori probability, so the notion of a letter ensemble $X$ becomes meaningless. Furthermore, the block compression code is a block code which assigns binary integers to entire blocks of strings, which requires a high computational effort. Keep also in mind that it is a lossy code where information may be irreversibly lost, so it is really not a good idea to compress a hard disk by block compression.

F. Variable length compression (Huffman coding)

Shannon demonstrated in H] that a message ensemble may be losslessly compressed by adapting the codeword lengths to the probability of the messages. The average length per symbol of the encoded messages will, in the optimal case, approach the Shannon entropy. Instead of encoding entire messages at once, one can design a symbol code that encodes each letter separately such that optimal compression is achieved by variable codeword lengths. Of such kind is the Huffman code, which is a binary symbol coding scheme that applies to messages of arbitrary length and is optimal on canonical messages. Furthermore, it is a lossless prefix code, so any source message can be retrieved from its encoding instantaneously and without any loss of information.

The Huffman code is completely defined by a single-letter code $c : \mathcal{A} \to \mathcal{A}_c$, mapping each letter $x$ to a binary codeword $c(x) \in \mathcal{A}_c \subset \{0, 1\}^+$ of variable length $L_c(x)$. The extended code on strings $x^n$ of arbitrary length $n$ is given by

$$c(x^n) := c(x_1) \cdots c(x_n) . \tag{25}$$

Because the raw information content of each encoded letter $x$ equals the length of its codeword, i.e. $I_c(x) = L_c(x)$, the letter ensemble information content is given by

$$\bar{I}_c(X) = \sum_{x \in \mathcal{A}} p(x) \, L_c(x) . \tag{26}$$



It can be shown by using the Kraft inequality (see [pi , pp 93) that the average length $\bar{L}_c(X) = \sum_x p(x) L_c(x)$ of any lossless binary symbol code fulfills

$$\bar{L}_c(X) \ge H(X) , \tag{27}$$

with equality if and only if

$$L_c(x) = -\log p(x) . \tag{28}$$

Of course, in real life $L_c(x)$ must be an integer, at best the integer next above $-\log p(x)$. The Huffman code comes as close to the ideal as an integer-length symbol code can: it constructs for every source letter $x$ a binary prefix codeword such that the average codeword length lies between $H(X)$ and $H(X) + 1$. This way, it is an optimal symbol code on the alphabet $\mathcal{A}$, minimizing the ensemble information of each letter. Again, let us view $-\log p(x)$ as an "ideal length" that can be interpreted as the core information of the letter $x$, where all redundancy due to statistical predictability has been removed. It is given the name Shannon information content, denoted by

$$h(x) := -\log p(x) . \tag{29}$$

The ensemble average of $h(x)$ yields the Shannon entropy

$$H(X) = \langle h(X) \rangle = -\sum_{x \in \mathcal{A}} p(x) \log p(x) . \tag{30}$$



Now consider messages $x^N$ drawn from the canonical ensemble $X^N$. The information content of each encoded message is

$$I_c(x^N) = L_c(x^N) = \sum_{n=1}^{N} L_c(x_n) . \tag{31}$$

The ensemble information thus reads

$$\bar{I}_c(X^N) = N \, \bar{I}_c(X) . \tag{32}$$

An ideal Huffman code providing $L_c(x) = h(x)$ would give the ensemble information

$$\bar{I}_c(X^N) = N H(X) . \tag{33}$$

Since $H(X) \le \log|\mathcal{A}|$ the Huffman code is a compression code satisfying condition (16). For any lossless code we have $\bar{L}_c(X) \ge H(X)$, so it is an optimal lossless code on canonical messages of any length. What about disadvantages? There is some probability that a particular message is lengthened instead of being compressed. This is the price for having a lossless code. While a lossy code compresses the most probable files but forgets the rest, a lossless code compresses the most probable files and enlarges the rest. In both cases the following holds: the bigger the message, the less likely the bad case.
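For concreteness, the standard Huffman construction can be sketched in Python as follows. It repeatedly merges the two least probable subtrees and prepends one bit to the codewords of each. The function names and the example distribution are assumptions made for illustration, and ties between equal probabilities are broken by insertion order, which is an arbitrary choice.

```python
import heapq
import math
from itertools import count

def huffman_code(p):
    """Binary Huffman code {letter: codeword} for a distribution {letter: probability}."""
    if len(p) == 1:
        return {next(iter(p)): "0"}
    tiebreak = count()  # unique counter, so the dictionaries are never compared
    heap = [(q, next(tiebreak), {x: ""}) for x, q in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        q0, _, code0 = heapq.heappop(heap)   # least probable subtree
        q1, _, code1 = heapq.heappop(heap)   # second least probable subtree
        merged = {x: "0" + w for x, w in code0.items()}
        merged.update({x: "1" + w for x, w in code1.items()})
        heapq.heappush(heap, (q0 + q1, next(tiebreak), merged))
    return heap[0][2]

def entropy(p):
    """Shannon entropy H(X) in bits (Eq. 30)."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def average_length(p, code):
    """Letter ensemble information content sum_x p(x) L_c(x) (Eq. 26)."""
    return sum(p[x] * len(code[x]) for x in p)

# Hypothetical dyadic distribution: the ideal lengths -log2 p(x) are integers.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(p)
print(code)
print(average_length(p, code), entropy(p))  # both 1.75 bits for this distribution
```

For a dyadic distribution like this one the bound (27) is met with equality. For other distributions the average length exceeds $H(X)$ by less than one bit.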



G. General variable length compression 

The principle of variable length coding can be used to compress general messages $X$. Consider a set $\Omega \subset \mathcal{A}^+$ of messages $\boldsymbol{x}$ of fixed or variable length over the alphabet $\mathcal{A}$, distributed by $p(\boldsymbol{x})$. Now take the message set $\Omega$ itself for an alphabet, i.e. construct a Huffman code that maps any message $\boldsymbol{x} \in \Omega$ to a binary codeword of length

$$L_c(\boldsymbol{x}) \approx -\log p(\boldsymbol{x}) . \tag{34}$$

Then again the ensemble information is minimized to

$$\bar{I}_c(X) = -\sum_{\boldsymbol{x} \in \Omega} p(\boldsymbol{x}) \log p(\boldsymbol{x}) = H(X) . \tag{35}$$

Of course, if the message set $\Omega$ is infinite, it would take forever to construct the corresponding Huffman code. But if $\Omega$ is small enough or if the probability distribution is sharply peaked around a small subset of $\Omega$, it might even be more effective to construct a Huffman code on the message set $\Omega$ than on the alphabet $\mathcal{A}$. However, we lose the advantage of sequential coding, i.e. coding letter by letter, since the code assigns a codeword to entire messages rather than to each letter. In case of canonical messages $X = X^N$ we have

$$\bar{I}_c(X^N) = H(X^N) = N H(X) , \tag{36}$$

hence by the "Huffman" message code the same compression is achieved as by the Huffman symbol code.



H. Core information content 

Just like the raw information content of a given message measures the real effort of communicating it, the Shannon information content measures the ideal effort, after encoding it by an optimal compression code that exploits the statistical properties of the whole ensemble. We may therefore define an observable core information content $I_0 : \Omega \subset \mathcal{A}^+ \to [0, \infty)$, applying to general messages $\boldsymbol{x}$ of an ensemble $X$, giving each message its Shannon information content:

$$I_0(\boldsymbol{x}) := h(\boldsymbol{x}) = -\log p(\boldsymbol{x}) . \tag{37}$$

The average of $I_0$, the ensemble core information content, is equal to the Shannon entropy:

$$\bar{I}_0(X) := H(X) = -\sum_{\boldsymbol{x} \in \Omega} p(\boldsymbol{x}) \log p(\boldsymbol{x}) . \tag{38}$$

Why these new names, since there is nothing new defined? The motivation is to stress the meaning contained in the notions of Shannon information content and Shannon entropy, in order to make a generalization to quantum information possible. It will then appear more reasonable to speak of an observable "core information content".

In order to illustrate the difference between the raw information content and the core information content, imagine two copies of the same book. Surely, one has to pay twice the price to buy them, since the printer has twice the work printing them. If the book can be downloaded from the internet and you download it twice, it takes twice the time and occupies twice the space on your hard disk. So the two copies carry twice the raw information content of a single copy. But at the very moment you notice your download mistake, you surely would delete one of the copies from your hard disk, since the pair does not contain twice the core information. The two copies can be compressed to one book without any loss of information. This is possible by reversibly mapping the message set $\Omega$ of the single book to the set $\Omega_2 := \{ (\boldsymbol{x}, \boldsymbol{x}) \mid \boldsymbol{x} \in \Omega \}$ of pairs of messages representing two copies of the book, and vice versa. As this is a lossless code and the probability distributions on $\Omega$ and $\Omega_2$ are identical, the core information of one book equals the core information of two copies of the same book.

IV. QUANTUM INFORMATION THEORY

A. Notions and definitions

For further details on the following notions and definitions the reader is referred to [1]. A quantum alphabet $Q$ is a set of Hilbert vectors normalized to unity,

$$Q := \{ |x\rangle \} \subset \mathcal{H} . \tag{39}$$

The letters of $Q$ span the letter space $\mathcal{H}_Q := \mathrm{Span}(Q)$. Since the letter states have to be neither mutually orthogonal nor linearly independent, the dimension of the letter space reads in general $K_Q := \dim \mathcal{H}_Q \le |Q|$, with equality if the letter states are linearly independent. There is a set of mutually orthogonal basis letters $B_Q = \{ |a\rangle \}_a$ with $\dim \mathcal{H}_Q = |B_Q|$. A quantum string is a product vector $|x^n\rangle = |x_1\rangle \otimes \ldots \otimes |x_n\rangle$ of letter states $|x\rangle$. All possible quantum strings of length $n$ over the alphabet $Q$ form the set

$$Q^n := \{ |x^n\rangle \} \subset \mathcal{H}^{\otimes n} . \tag{40}$$



(40) 



The elements of $Q^n$ span the block space

$$\mathcal{H}_Q^n := \mathrm{Span}(Q^n) , \tag{41}$$

where $\mathcal{H}_Q^0 := \mathrm{Span}(Q^0)$ is the one-dimensional space spanned by the empty message $|\cdot\rangle$ that forms the set $Q^0 := \{ |\cdot\rangle \}$. A many-letter message is a vector $|\varphi\rangle$ in the many-letter space

$$\mathcal{M}_Q := \bigoplus_{n=0}^{\infty} \mathcal{H}_Q^n \tag{42}$$

and can generally be represented as a superposition of block strings $|a^n\rangle$ over the basis alphabet $B_Q$:

$$|\varphi\rangle = \sum_{n=0}^{\infty} \sum_{a^n} \varphi(a^n) \, |a^n\rangle , \tag{43}$$

with the wave components $\varphi(a^n) := \langle a^n | \varphi \rangle$ having distinct length $n$. An a priori message ensemble is represented by a random variable $|\Phi\rangle$, whose realizations are quantum messages $|\varphi\rangle$ chosen from a source message set $\Gamma$ with a priori probabilities $p(\varphi)$. The corresponding message matrix reads

$$\sigma := \sum_{\varphi \in \Gamma} p(\varphi) \, |\varphi\rangle\langle\varphi| . \tag{44}$$

A canonical message is a product message $|x^n\rangle$ chosen from an ensemble $|X^n\rangle$ with probability $p(x^n) = p(x_1) \cdots p(x_n)$. For canonical messages there is a letter matrix

$$\rho = \sum_{x} p(x) \, |x\rangle\langle x| , \tag{45}$$

such that the message matrix separates into the $n$-fold tensor product of the letter matrix, i.e. $\sigma = \rho^{\otimes n}$. A grand canonical message is represented by the message matrix

$$\sigma = \sum_{n=0}^{\infty} \lambda_n \, \rho^{\otimes n} , \tag{46}$$

with $\lambda_n \ge 0$, $\sum_n \lambda_n = 1$. The length of a message can be observed by the self-adjoint length operator $\hat{L}$ acting on the many-letter space $\mathcal{M}_Q$, represented by a spectral decomposition of mutually orthogonal projectors $\Pi_n$ on $\mathcal{M}_Q$, such that

$$\hat{L} = \sum_{n=0}^{\infty} n \, \Pi_n , \tag{47}$$

with

$$\Pi_n \Pi_m = \delta_{nm} \Pi_n , \qquad \sum_{n} \Pi_n = 1 . \tag{48}$$

The eigenspaces of the length operator are the block message spaces $\mathcal{H}_Q^n$, which are subspaces of the many-letter space $\mathcal{M}_Q$. Hence the eigenvalues $n$ of $\hat{L}$ are degenerate by $K_Q^n := \dim \mathcal{H}_Q^n = (\dim \mathcal{H}_Q)^n$. Using the basis letter set $B_Q = \{ |a\rangle \}_a$ one obtains the spectral decomposition

$$1 = \sum_{n=0}^{\infty} \sum_{a^n} |a^n\rangle\langle a^n| \tag{49}$$

of the unity operator on $\mathcal{M}_Q$. The sum above is always understood as the sum over all distinct strings of length $n$ over the basis alphabet $B_Q = \{ |a\rangle \}$.



B. Raw quantum information content 

It is tempting (and we will give in to this temptation) to define an observable that measures the quantum information contained in a single message $|\varphi\rangle$. Bob builds a meter on top of his receiver that he can switch on and which then indicates the number of active quantum subsystems while receiving a message from Alice. Say, Alice sends him a single-letter message $|x\rangle$ from the quantum alphabet $Q$. In order to receive this message, the receiver has to be sensitive enough to recognize each wave component of the message, whose number is $\dim \mathcal{H}_Q$. If Alice sends him a block message of length $n$, there are $n$ receiver units in action. Bob calibrates his meter to show 1 unit of quantum information if Alice sends him a message composed from a two-state system. In analogy to the reasoning of section II B, the quantum information content of a block message $|\varphi\rangle$ of length $L(\varphi)$ composed from the alphabet $Q$ reads

$$I(\varphi) = \log(\dim \mathcal{H}_Q) \, L(\varphi) . \tag{50}$$

Consequently, the observable that measures the raw quantum information content of an arbitrary quantum message $|\varphi\rangle \in \mathcal{M}_Q$ can be defined as

$$\hat{I} := \log(\dim \mathcal{H}_Q) \, \hat{L} . \tag{51}$$

Using the orthogonal letter basis $B_Q = \{ |a\rangle \}$, this observable may be written as

$$\hat{I} = \log(\dim \mathcal{H}_Q) \sum_{n=0}^{\infty} \sum_{a^n} n \, |a^n\rangle\langle a^n| . \tag{52}$$

It is typically quantum that a given message has generally no well-defined information content. Rather, there is an expected raw quantum information content, given by

$$I(\varphi) = \langle \varphi | \hat{I} | \varphi \rangle . \tag{53}$$

Like every measurement, the detection of its quantum information content potentially disturbs the message. The number "quantum information content" is itself a piece of classical information, and obtaining it destroys quantum correlations between components of distinct information content.

The (expected) raw quantum information content of an arbitrary message matrix $\sigma$ is calculated by

$$I(\sigma) := \mathrm{Tr}\{ \sigma \hat{I} \} . \tag{54}$$



The unit of quantum information is "1 qbit". So, within 
the presented framework, the name "qbit" obtains two 
meanings: 1) A two-level quantum system and 2) The 
unit of quantum information, measured by the observable 
/. This goes in close analogy to classical information the- 
ory, where the name "bit" also means both a two-state 
system (e.g. a dot on a compact disk) and the unit of 
classical information. 
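Since the length operator is diagonal in the basis of strings $|a^n\rangle$, the expectation values (53) and (54) can be computed without building any matrices. In the following sketch a many-letter state is represented as a Python dictionary mapping basis strings to amplitudes. This representation, the function names and the example states are assumptions made for illustration only.

```python
import math

def expected_length(state):
    """<phi|L|phi> for a normalized state {basis string: amplitude}."""
    return sum(abs(amp) ** 2 * len(string) for string, amp in state.items())

def raw_quantum_information(state, letter_dim):
    """<phi|I|phi> with I = log2(dim H_Q) * L, in qbits (Eqs. 51 and 53)."""
    return math.log2(letter_dim) * expected_length(state)

def ensemble_raw_quantum_information(ensemble, letter_dim):
    """Tr{sigma I} for sigma = sum_phi p(phi) |phi><phi| (Eq. 54)."""
    return sum(p * raw_quantum_information(state, letter_dim) for state, p in ensemble)

s = 1 / math.sqrt(2)
# Hypothetical superposition of a one-letter and a three-letter string over a qbit alphabet.
phi = {"0": s, "011": s}
# Hypothetical ensemble: a list of (state, a priori probability) pairs.
ensemble = [(phi, 0.5), ({"10": 1.0}, 0.5)]

print(raw_quantum_information(phi, 2))                # close to 2 qbits
print(ensemble_raw_quantum_information(ensemble, 2))  # close to 2 qbits
```

The state `phi` has no well-defined length. A length measurement would yield 1 or 3 with equal probability, and only the expectation value equals 2.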



V. QUANTUM CODING

A. Encoding

A classical code is a function that maps one message ensemble to another. Thus a quantum code simply maps one quantum message ensemble to another. Let the source ensemble be an ensemble of general many-letter messages, defined by the random variable

$$|\Phi\rangle := \{ [ |\varphi\rangle, p(\varphi) ] \mid |\varphi\rangle \in \Gamma \} , \tag{55}$$

with the source set

$$\Gamma := \{ |\varphi\rangle \in \mathcal{M}_Q \mid p(\varphi) > 0 \} \tag{56}$$

of many-letter messages composed from the alphabet $Q = \{ |x\rangle \}$. The source message ensemble corresponds to the source matrix

$$\sigma = \sum_{\varphi \in \Gamma} p(\varphi) \, |\varphi\rangle\langle\varphi| . \tag{57}$$

A code maps the source ensemble to a code ensemble $|\Phi_c\rangle$ of code messages $|\varphi_c\rangle$ over a code alphabet $Q_c$, taken from a code set $\Gamma_c$ with a priori probabilities $p_c(\varphi_c)$. The code alphabet spans a code letter space $\mathcal{H}_C$, which in turn induces a many-letter code space $\mathcal{M}_C$ containing all messages that can be composed from $Q_c$. The code ensemble is also represented by the code matrix $\sigma_c \in S(\mathcal{M}_C)$, which is required to be a density matrix. The code $c$ can be represented by a superoperator $c$ acting on the state space $S(\mathcal{M}_Q)$ of density matrices over the many-letter space $\mathcal{M}_Q$ and mapping them into the state space $S(\mathcal{M}_C)$ of encoded density matrices over the many-letter space $\mathcal{M}_C$. Thus we have $c : S(\mathcal{M}_Q) \to S(\mathcal{M}_C)$ with $\sigma_c = c(\sigma)$.

The most general thing that Alice can do with the quantum state $\sigma$ is a completely positive map (CPM), i.e. a completely positive function mapping density matrices to density matrices. Every CPM has a Kraus representation

$$\sigma_c = \sum_i E_i \, \sigma \, E_i^\dagger \tag{58}$$

with Kraus operators $E_i$. The CPM needed here maps states in $S(\mathcal{M}_Q)$ to states in a different state space $S(\mathcal{M}_C)$. So the Kraus operators governing the encoding process are linear operators $E_i : \mathcal{M}_Q \to \mathcal{M}_C$, which are named encoders, fulfilling the Kraus property

$$\sum_i E_i^\dagger E_i = 1 . \tag{59}$$

A code having the additional property

$$\sum_i E_i E_i^\dagger = 1_{\mathcal{M}_C} \tag{60}$$

is a unital code.

Alice encodes each of her a priori messages $|\varphi\rangle$ into a generally mixed state

$$\sigma_\varphi := \sum_i E_i \, |\varphi\rangle\langle\varphi| \, E_i^\dagger , \tag{61}$$

so Bob will receive the encoded message ensemble

$$\sigma_c = \sum_{\varphi \in \Gamma} p(\varphi) \sum_i E_i \, |\varphi\rangle\langle\varphi| \, E_i^\dagger . \tag{62}$$
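The Kraus property (59), the unital property (60) and the encoding map (58) can be checked numerically on a small example. The sketch below uses plain nested lists as matrices and two textbook single-qbit channels (amplitude damping and dephasing) as stand-ins. These channels act on a single two-dimensional space rather than on many-letter spaces, and they, the helper names and the parameter values are assumptions chosen for illustration.

```python
import math

def dagger(m):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matsum(mats):
    return [[sum(m[i][j] for m in mats) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]

def is_identity(m, tol=1e-12):
    return all(abs(m[i][j] - (1 if i == j else 0)) <= tol
               for i in range(len(m)) for j in range(len(m[0])))

def satisfies_kraus_property(encoders):
    """Check sum_i E_i^dagger E_i = 1 (Eq. 59)."""
    return is_identity(matsum([matmul(dagger(E), E) for E in encoders]))

def is_unital(encoders):
    """Check sum_i E_i E_i^dagger = 1 (Eq. 60)."""
    return is_identity(matsum([matmul(E, dagger(E)) for E in encoders]))

def encode(sigma, encoders):
    """sigma_c = sum_i E_i sigma E_i^dagger (Eq. 58)."""
    return matsum([matmul(matmul(E, sigma), dagger(E)) for E in encoders])

g = 0.25  # hypothetical damping and dephasing strength
amplitude_damping = [[[1.0, 0.0], [0.0, math.sqrt(1 - g)]],
                     [[0.0, math.sqrt(g)], [0.0, 0.0]]]
dephasing = [[[math.sqrt(1 - g), 0.0], [0.0, math.sqrt(1 - g)]],
             [[math.sqrt(g), 0.0], [0.0, -math.sqrt(g)]]]

sigma = [[0.0, 0.0], [0.0, 1.0]]  # density matrix of the basis state |1>
print(satisfies_kraus_property(amplitude_damping), is_unital(amplitude_damping))
print(satisfies_kraus_property(dephasing), is_unital(dephasing))
print(encode(sigma, amplitude_damping))
```

Both channels fulfill the Kraus property, so the encoded matrix keeps unit trace, but only the dephasing channel is unital.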



B. Decoding 



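Bob wants to decode the encoded message he obtained from Alice. He applies a CPM, given by some Kraus operators $D_j : \mathcal{M}_C \to \mathcal{M}_Q$, called decoders, and finally obtains the decoded matrix

$$\sigma' = \sum_{ij} \sum_{\varphi \in \Gamma} p(\varphi) \, D_j E_i \, |\varphi\rangle\langle\varphi| \, E_i^\dagger D_j^\dagger = \sum_{\varphi \in \Gamma} p(\varphi) \, \sigma'_\varphi , \tag{63}$$

where $\sigma'_\varphi$ is the mixed state that Bob obtains by decoding the encoded a priori state $|\varphi\rangle$. In general the decoded matrix $\sigma'$ is not identical to the source matrix $\sigma$. What can be said about the confidence of the transmission? Say, Alice sends a message $|\varphi\rangle$. After encoding and decoding, the message will be crumbled into the mixed state

$$\sigma'_\varphi = \sum_{ij} D_j E_i \, |\varphi\rangle\langle\varphi| \, E_i^\dagger D_j^\dagger . \tag{64}$$

Still, there is a certain probability that Bob can recover the original message by a generalized measurement. The probability of finding the state $|\varphi\rangle$ in the decoded state $\sigma'_\varphi$ is given by the fidelity

$$F(\varphi) = \langle \varphi | \sigma'_\varphi | \varphi \rangle = \sum_{ij} | \langle \varphi | D_j E_i | \varphi \rangle |^2 . \tag{65}$$

The confidence of the code is then defined by the average fidelity,

$$F := \sum_{\varphi \in \Gamma} p(\varphi) \sum_{ij} | \langle \varphi | D_j E_i | \varphi \rangle |^2 , \tag{66}$$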



whereas the probability of error is given by

$$P_{\mathrm{err}} := 1 - F . \tag{67}$$

Bob now has the task to construct decoders $D_j$ optimizing the confidence of the code, i.e. decreasing the probability of error. However, the confidence cannot be expressed in terms of density matrices. It is an expression that requires Alice's a priori knowledge of the message ensemble, i.e. the random variable $|\Phi\rangle$. So Bob has to do the job together with Alice, constructing suitable decoders that maximize the confidence of the transmission.

C. Lossy and lossless codes

A lossless code is represented by an invertible superoperator $c$ with only one Kraus operator $E$. According to (59), $E$ must be an isometric operator, i.e. $E^\dagger E = 1$. A unitary code fulfills in addition $E E^\dagger = 1_{\mathcal{M}_C}$. Using a lossless code, Alice encodes her source matrix through $\sigma_c = E \sigma E^\dagger$, and Bob decodes it uniquely through $\sigma = E^\dagger \sigma_c E$. For a lossless code the confidence of transmission, given by (66), is $F = 1$.

A lossy code has a Kraus representation with more than one Kraus operator. It is not possible to uniquely recover the source matrix $\sigma$. Instead, the decoding process, using decoders $D_j$, gives a decoded matrix $\sigma'$ given by (63). For a lossy code the confidence of transmission is $F < 1$. If the confidence can be made close to unity, the lossy code may be useful.

D. Compression codes

A quantum compression code reduces the information content of the message ensemble $\sigma$, given by (54). The quantum information content of an encoded state is represented by the observable $\hat{I}_C = \log(\dim \mathcal{H}_C) \, \hat{L}_C$, where $\hat{L}_C$ is the length operator in the code space $\mathcal{M}_C$. However, it is more convenient to express everything in the source space $\mathcal{M}_Q$. In the following, the subscript $C$ marks operators on the code space and the subscript $c$ their counterparts on the source space.

The average length of an encoded state $\sigma_c \in S(\mathcal{M}_C)$ is given by $L_C(\sigma_c) = \mathrm{Tr}\{ \sigma_c \hat{L}_C \}$, where the encoded state is obtained from the original state $\sigma \in S(\mathcal{M}_Q)$ by $\sigma_c = \sum_i E_i \sigma E_i^\dagger$. The length operator $\hat{L}_C$ on $\mathcal{M}_C$ can be mapped to an observable $\hat{L}_c$ on $\mathcal{M}_Q$ by

$$\hat{L}_c := \sum_i E_i^\dagger \hat{L}_C E_i , \tag{68}$$

such that the average of $\hat{L}_c$ for the source ensemble $\sigma$, $L_c(\sigma) = \mathrm{Tr}\{ \sigma \hat{L}_c \}$, equals the ensemble length $L_C(\sigma_c)$ of the encoded ensemble $\sigma_c$. That way, one can define the observable encoded quantum information, acting on the source space $\mathcal{M}_Q$, by

$$\hat{I}_c := \log(\dim \mathcal{H}_C) \, \hat{L}_c , \tag{69}$$

so that the expected encoded quantum information content of a message matrix $\sigma$ reads

$$I_c(\sigma) = \mathrm{Tr}\{ \sigma \hat{I}_c \} . \tag{70}$$

The observable $\hat{I}_c$ indicates how long a source message would be if it was encoded by $c$. A compression code is thus a code $c : S(\mathcal{M}_Q) \to S(\mathcal{M}_C)$ that reduces the quantum information of the message ensemble, i.e.

$$I_c(\sigma) < I(\sigma) . \tag{71}$$



E. Translation of messages 

Alice has just typed a message to Bob into her quantum computer and now wants to save it. But the quantum hard disk only operates with qbits, whereas the message is written in English. So the quantum computer has to invoke an algorithm to translate the message from the English alphabet to the qbit alphabet. Needless to say, lossless coding is desired here. More generally, let $Q$, $Q_C$ be two quantum alphabets with corresponding basis alphabets $B_Q = \{|a\rangle\}_a$, $B_C = \{|c\rangle\}_c$, spanning the letter spaces $\mathcal{H}_Q$, $\mathcal{H}_C$ and inducing the many-letter spaces $\mathcal{M}_Q$, $\mathcal{M}_C$, respectively. A translation code between the alphabets $Q$ and $Q_C$ is completely specified by an isometric block translator $t : \mathcal{H}_Q^N \to \mathcal{H}_C^M$ mapping each block of $N$ source basis letters to a block of $M$ code basis letters, i.e.



$$\forall\, |a^N\rangle \in B_Q^N:\quad t\,|a^N\rangle = |c^M(a^N)\rangle \in B_C^M, \tag{72}$$



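where $|c^M(a^N)\rangle = |(c_1 \cdots c_M)(a^N)\rangle$ is a string of $M$ basis letters over the code alphabet with $\langle c^M(a^N)|c^M(a'^N)\rangle = \delta_{a^N a'^N}$. The value

$$R := \frac{M}{N} \tag{73}$$

is called the rate of the code and has to fulfill

$$R \ge \frac{\log(\dim\mathcal{H}_Q)}{\log(\dim\mathcal{H}_C)} \tag{74}$$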



in order to reversibly encode each source letter block. 
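As a worked illustration of the rate condition, the following sketch computes the smallest code block length $M$ for which $(\dim\mathcal{H}_C)^M \ge (\dim\mathcal{H}_Q)^N$. The function names and the 26-letter source alphabet are assumptions chosen for illustration (echoing the English-to-qbit example), not part of the paper.

```python
import math

def min_code_block_length(dim_q: int, dim_c: int, n: int) -> int:
    """Smallest M with dim_c**M >= dim_q**n, using exact integer arithmetic."""
    target = dim_q ** n
    m, power = 0, 1
    while power < target:
        power *= dim_c
        m += 1
    return m

def rate_bound(dim_q: int, dim_c: int) -> float:
    """Lower bound log(dim H_Q) / log(dim H_C) on the rate R = M / N."""
    return math.log(dim_q) / math.log(dim_c)

# Hypothetical example: a 26-letter source alphabet translated to qbits.
for n in (1, 3, 10):
    m = min_code_block_length(26, 2, n)
    print(f"N={n}: M={m}, R={m / n:.3f}, bound={rate_bound(26, 2):.3f}")
```

Larger blocks let the integer-valued $M$ approach the bound more closely, but the rate never falls below it.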

Since the basis letters are mutually orthogonal, the block translator $t$ reads

$$t = \sum_{a^N} |c^M(a^N)\rangle\langle a^N|. \tag{75}$$



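The message translator is then defined by

$$T := \sum_{n=0}^{\infty} t^{\otimes n}, \tag{76}$$

where

$$t^{\otimes 0} := |\cdot\rangle\langle\cdot|, \tag{77}$$
$$t^{\otimes n} := \underbrace{t \otimes \cdots \otimes t}_{n}. \tag{78}$$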



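Because the block translator $t$ is isometric, the message translator $T : \mathcal{M}_Q \to \mathcal{M}_C$ is also isometric, i.e. $T^\dagger T = \mathbb{1}$, and reads in general

$$T = \sum_{n=0}^{\infty}\sum_{a^{nN}} |c^{nNR}(a^{nN})\rangle\langle a^{nN}|, \tag{79}$$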



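where $|a^{nN}\rangle = |a_1^N \cdots a_n^N\rangle$ denotes a string of $n$ blocks of length $N$ being mapped to a codeword $|c^{nNR}\rangle = |c_1^{NR} \cdots c_n^{NR}\rangle$ of $n$ blocks of length $NR$. Every quantum message $|\psi\rangle \in \mathcal{M}_Q$ is translated to

$$T|\psi\rangle = \sum_{n=0}^{\infty}\sum_{a^{nN}} \psi(a^{nN})\, |c^{nNR}(a^{nN})\rangle, \tag{80}$$

with the wave components $\psi(a^{nN}) := \langle a^{nN}|\psi\rangle$. The whole message ensemble $\sigma \in S(\mathcal{M}_Q)$ is translated to $\sigma_c = T \sigma T^\dagger$.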

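The observable measuring the encoded quantum information of a message being translated is, according to (69),

$$\begin{aligned}
I_c &= \log(\dim\mathcal{H}_C)\, L_c = \log(\dim\mathcal{H}_C)\, T^\dagger \hat{L}_c T && (81)\\
&= \log(\dim\mathcal{H}_C) \sum_{n=0}^{\infty}\sum_{a^{nN}} nNR\, |a^{nN}\rangle\langle a^{nN}| && (82)\\
&= R\,\frac{\log(\dim\mathcal{H}_C)}{\log(\dim\mathcal{H}_Q)}\, I. && (83)
\end{aligned}$$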

Since the rate $R$ has to fulfill condition (74), we have $I_c \ge I$, i.e. translation codes are never compressive. This is reasonable, since compression is only possible by taking advantage of statistical properties of the message ensemble. A translation code is not based on statistical properties, hence it cannot be compressive. In the best case, the rate fulfills (74) with equality, so the encoded raw information just equals the source information.

Case 1: $\dim\mathcal{H}_Q \le \dim\mathcal{H}_C$.
Alice's alphabet is not bigger than the alphabet of the quantum hard disk. So she can choose a block of $N$ source letters that is mapped to a single code letter. The rate of the code is $R \le 1$.

Case 2: $\dim\mathcal{H}_Q > \dim\mathcal{H}_C$.
Alice's alphabet is bigger than the alphabet of the quantum hard disk. So it is necessary to find codewords of length $R > 1$ for every source basis letter.



F. Block compression: Schumacher coding 

1. Standard Schumacher coding 

The Schumacher code (see Q) is the quantum analogue to block compression (see section III E). It is a lossy code on canonical messages of fixed length $N$. Thus throughout this section we stay in the block space $\mathcal{H}_Q^N$.

Alice uses a canonical message ensemble given by

$$\mathcal{X}^N := \{[\,|x^N\rangle, p(x^N)\,] \mid |x^N\rangle \in \Gamma\} \tag{84}$$

with the source set $\Gamma$ of all quantum strings $|x^N\rangle$ of length $N$ over the alphabet $Q$ which spans the letter space $\mathcal{H}_Q$. The a priori probabilities read $p(x^N) = p(x_1) \cdots p(x_N)$. The source message ensemble corresponds to the message matrix

$$\sigma = \rho^{\otimes N} = \rho \otimes \cdots \otimes \rho, \tag{85}$$

where the letter matrix is given by

$$\rho = \sum_x p(x)\, |x\rangle\langle x|. \tag{86}$$



The set of $\rho$-eigenvectors forms a basis letter set $B_Q = \{|a\rangle\}_a$, such that $\dim\mathcal{H}_Q = |B_Q|$ and

$$\rho = \sum_a q(a)\, |a\rangle\langle a|. \tag{87}$$



Hence the source message matrix obtains a diagonal form in the basis strings $|a^N\rangle \in B_Q^N$ with $q(a^N) = q(a_1) \cdots q(a_N)$. The ensemble that Alice submits appears to Bob as a mixture of strings over an alphabet $B_Q$ of perfectly distinguishable letters $|a\rangle$, each one distributed independently by $q(a)$. Shannon's noiseless coding theorem may be applied as follows. There is a typical set $T_\delta^N$ of quantum strings $|a^N\rangle$, whose probabilities fulfill (cf (p^))

$$2^{-N(H+\delta)} \le q(a^N) \le 2^{-N(H-\delta)}, \tag{88}$$

such that for every $\epsilon, \delta > 0$ and sufficiently large $N$ we have

$$P_T := P(|a^N\rangle \in T_\delta^N) > 1 - \epsilon. \tag{89}$$



Here $H$ is the Shannon entropy of the basis letter ensemble,

$$H := -\sum_a q(a) \log q(a), \tag{90}$$

which equals the von Neumann entropy of the letter matrix $\rho$,

$$S(\rho) := -\mathrm{Tr}\{\rho \log \rho\}, \tag{91}$$

i.e. $H = S(\rho)$. The von Neumann entropy is bounded from above by

$$S(\rho) \le \log(\dim\mathcal{H}_Q). \tag{92}$$
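Because $\rho$ is diagonal in its eigenbasis, its von Neumann entropy can be computed directly from the eigenvalues $q(a)$. The following is a minimal sketch with a hypothetical four-letter spectrum; the function name and the numbers are illustrative assumptions.

```python
import math

def shannon_entropy(probs):
    """H = -sum q log2 q in bits; zero-probability letters contribute nothing."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

eigenvalues = [0.5, 0.25, 0.125, 0.125]   # hypothetical spectrum q(a) of rho
H = shannon_entropy(eigenvalues)          # equals S(rho) for this spectrum
bound = math.log2(len(eigenvalues))       # log(dim H_Q), the upper bound
print(H, bound)
```

The bound is attained only by the uniform spectrum, i.e. when the letter matrix is maximally mixed.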



As the typical set $T_\delta^N$ contains mutually orthogonal vectors, they span a typical subspace

$$V_\delta^N := \mathrm{Span}(T_\delta^N) \tag{93}$$

with $\dim V_\delta^N = |T_\delta^N|$. According to Shannon (cf (|2^)) we therefore have

$$(1-\epsilon)\, 2^{N(H-\delta)} \le \dim V_\delta^N \le 2^{N(H+\delta)}. \tag{94}$$

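Define the projector on the typical subspace by

$$\Pi_T := \sum_{|a^N\rangle \in T_\delta^N} |a^N\rangle\langle a^N|, \tag{95}$$

then the total probability of messages lying in the typical subspace reads

$$P_T = \sum_{|a^N\rangle \in T_\delta^N} q(a^N) = \mathrm{Tr}\{\rho^{\otimes N}\Pi_T\}, \tag{96}$$

so together with (89) we have

$$\mathrm{Tr}\{\rho^{\otimes N}\Pi_T\} > 1 - \epsilon. \tag{97}$$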
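The behaviour of the typical set can be explored numerically for a small basis alphabet. The sketch below assumes a hypothetical binary basis alphabet with $q(1) = 0.1$ and tolerance $\delta = 0.1$; strings with the same number of ones share one probability, so the sum over strings is grouped by that number. The function name and all parameters are illustrative.

```python
import math

def typical_set_stats(q1: float, n: int, delta: float):
    """Entropy, size and total probability of the typical set (binary alphabet).

    A string with k ones has probability q1**k * (1 - q1)**(n - k); it is
    typical if that probability lies between 2**(-n(H+delta)) and 2**(-n(H-delta)).
    """
    q0 = 1.0 - q1
    h = -(q0 * math.log2(q0) + q1 * math.log2(q1))
    size, prob = 0, 0.0
    for k in range(n + 1):
        log_p = k * math.log2(q1) + (n - k) * math.log2(q0)
        if -n * (h + delta) <= log_p <= -n * (h - delta):
            size += math.comb(n, k)                   # number of such strings
            prob += math.comb(n, k) * 2.0 ** log_p    # their total probability
    return h, size, prob

for n in (10, 100, 1000):
    h, size, prob = typical_set_stats(0.1, n, 0.1)
    print(n, size <= 2 ** (n * (h + 0.1)), round(prob, 4))
```

For short blocks the typical set carries only a modest share of the probability; the share approaches one as the block length grows, which is why the confidence of the code improves with $N$.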



Alice now encodes message components in the typical subspace by the encoder

$$E_T := \sum_{|a^N\rangle \in T_\delta^N} |c^R(a^N)\rangle\langle a^N|, \tag{98}$$



where $|c^R(a^N)\rangle$ is a unique codeword of length $R$ over an orthogonal code alphabet $B_C$ for the typical message $|a^N\rangle$. Since there are $\dim V_\delta^N$ orthogonal messages to encode, the rate $R$ of the code, which gives the dimension of the code space $\mathcal{H}_C^R$, obeys

$$R \ge \frac{\log(\dim V_\delta^N)}{\log(\dim\mathcal{H}_C)}, \tag{99}$$



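where $\mathcal{H}_C$ is the letter space spanned by the code alphabet $B_C = \{|c\rangle\}$. Alice maps the components outside the typical subspace to a junk string $|c^R_{\mathrm{junk}}\rangle \in \mathcal{H}_C^R$ of length $R$ orthogonal to the code image of the typical subspace by the encoder

$$E_{\neg T} := \sum_{|a^N\rangle \notin T_\delta^N} |c^R_{\mathrm{junk}}\rangle\langle a^N|, \tag{100}$$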



which gives the second Kraus operator. Altogether, any a priori source message $|x^N\rangle$ is encoded into the mixed state

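$$\begin{aligned}
\sigma_c^{x^N} &= E_T |x^N\rangle\langle x^N| E_T^\dagger + E_{\neg T} |x^N\rangle\langle x^N| E_{\neg T}^\dagger && (101)\\
&= \sum_{|a^N\rangle \in T_\delta^N} |\langle a^N|x^N\rangle|^2\, |c^R(a^N)\rangle\langle c^R(a^N)| \\
&\quad + \sum_{|a^N\rangle \notin T_\delta^N} |\langle a^N|x^N\rangle|^2\, |c^R_{\mathrm{junk}}\rangle\langle c^R_{\mathrm{junk}}|. && (102)
\end{aligned}$$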



Bob decodes the message by applying the decoders

$$D_T := E_T^\dagger, \qquad D_{\neg T} := \sum_{|c^R\rangle \notin W_T} |a^N_{\mathrm{junk}}\rangle\langle c^R|, \tag{103}$$

where $W_T \subset \mathcal{H}_C^R$ is the code image of the typical subspace, i.e.

$$W_T := c(V_\delta^N), \tag{104}$$

and $|c^R\rangle$ are mutually orthogonal strings of $R$ code basis letters, and $|a^N_{\mathrm{junk}}\rangle$ is a junk string of length $N$ outside the typical subspace. After encoding and decoding, the message $|x^N\rangle$ that Alice originally sent will be a mixture

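$$\begin{aligned}
\sigma'^{\,x^N} &= D_T\, \sigma_c^{x^N} D_T^\dagger + D_{\neg T}\, \sigma_c^{x^N} D_{\neg T}^\dagger && (105)\\
&= \sum_{|a^N\rangle \in T_\delta^N} |\langle a^N|x^N\rangle|^2\, |a^N\rangle\langle a^N| + \sum_{|a^N\rangle \notin T_\delta^N} |\langle a^N|x^N\rangle|^2\, |a^N_{\mathrm{junk}}\rangle\langle a^N_{\mathrm{junk}}|. && (106)
\end{aligned}$$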

How about the confidence? The fidelity of $|x^N\rangle$ in the mixture $\sigma'^{\,x^N}$ reads

F(a;^) = (x^KH^'')= Y l(«'^l^'')l' (107) 

I a") erf 

= l|nT|x^)r (108) 

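Since any real number $x$ satisfies $x^2 \ge 2x - 1$ (because $(x-1)^2 \ge 0$), we have

$$F(x^N) \ge 2\,\langle x^N|\Pi_T|x^N\rangle - 1. \tag{109}$$

It follows for the confidence $F$ of the code

$$\begin{aligned}
F &= \sum_{x^N} p(x^N)\, F(x^N) = \sum_{x^N} p(x^N)\, \big\|\Pi_T |x^N\rangle\big\|^4 && (110)\\
&\ge 2\,\mathrm{Tr}\{\rho^{\otimes N}\Pi_T\} - 1. && (111)
\end{aligned}$$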



Using (97) we conclude that the confidence of the Schumacher code is bounded from below by

$$F > 1 - 2\epsilon. \tag{112}$$

So Alice can achieve arbitrarily good confidence by choosing the block size $N$ large enough.

The observable measuring the content of quantum information in a Schumacher encoded message reads, according to (69),

$$\begin{aligned}
I_c &= \log(\dim\mathcal{H}_C)\, L_c && (113)\\
&= \log(\dim\mathcal{H}_C)\left(E_T^\dagger \hat{L}_c E_T + E_{\neg T}^\dagger \hat{L}_c E_{\neg T}\right). && (114)
\end{aligned}$$

Since we have

(115)
(116)

the encoded information operator reads

$$I_c = R\,\log(\dim\mathcal{H}_C)\,\Pi_T, \tag{117}$$



where the rate $R$ fulfills (99). Although $R$ must be an integer, we consider an ideal rate $R$ fulfilling (99) with equality. Furthermore, for $N$ very large, the dimension of $V_\delta^N$ approaches $\dim V_\delta^N \approx 2^{N S(\rho)}$. Hence the encoded information reads approximately

$$I_c \approx N S(\rho)\, \Pi_T, \tag{118}$$

whereas the information content of the source messages is measured by

$$\begin{aligned}
I &= \log(\dim\mathcal{H}_Q)\, L && (119)\\
&= N \log(\dim\mathcal{H}_Q)\, \mathbb{1}_N, && (120)
\end{aligned}$$

where $\mathbb{1}_N$ is the unity operator on $\mathcal{H}_Q^N$. Since the von Neumann entropy fulfills (92) we have

$$I_c \le I, \tag{121}$$

i.e. the Schumacher code is compressive on the entire block space $\mathcal{H}_Q^N$, because it fulfills (71) for any source message ensemble $\sigma$. This is not surprising, since lossy codes throw away information, hence any source message ensemble can only either be compressed or keep its size. Canonical messages of length $N$, containing $N \log(\dim\mathcal{H}_Q)$ qbits of information, are optimally compressed to $N S(\rho)$ qbits of information. Thus here the quantum information per letter is compressed from $\log(\dim\mathcal{H}_Q)$ to $S(\rho)$ qbits. This is not necessarily valid for messages of other types. In the next section, we will extend the Schumacher code to messages of a more general form, namely to grand canonical messages, and obtain a similar result.



2. Generalized Schumacher coding 

Within the framework of many-letter theory the Schumacher coding scheme can be generalized to grand canonical messages, i.e. messages $\sigma$ of the form

$$\sigma = \sum_{n=0}^{\infty} \lambda_n\, \rho^{\otimes n}. \tag{122}$$



The typical subspaces $V_\delta^n$ are spanned by the typical basis strings $|a^n\rangle$ of length $n$ in the typical set $T_\delta^n$. The rate $r$ of the code components depends on $n$ according to (99) for $R \mapsto r$ and $N \mapsto n$ varying. The typical many-letter subspace $V_\delta$ is given by

$$V_\delta := \bigoplus_{n=0}^{\infty} V_\delta^n \tag{123}$$

with $V_\delta^n$ given by (93) for $N \mapsto n$ varying. The encoders that Alice uses now read



oo 



(124) 
(125) 



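where again we set $|a^0\rangle := |\cdot\rangle$, $\rho^{\otimes 0} := |\cdot\rangle\langle\cdot|$ and let $T_\delta^0$ contain only the empty message $a^0 := (\cdot)$. The junk message is now allowed to be the empty message $|\cdot\rangle$. Bob's decoders look like

$$D_T := E_T^\dagger, \qquad D_{\neg T} := \sum_{m=0}^{\infty}\sum_{|c^m\rangle \notin W_T} |\cdot\rangle\langle c^m|, \tag{126}$$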



where $W_T$ is the code image of the typical subspace, i.e. $W_T := c(V_\delta)$, and $|c^m\rangle$ are mutually orthogonal code strings of length $m$. Since every subspace $\mathcal{H}_Q^n$ of messages of length $n$ is orthogonal to any subspace of messages of different length, encoding and decoding of different subspaces do not interfere. However, Schumacher coding will only be reliable and optimal within the higher dimensional subspaces. Considering ideal rates and a length distribution $\lambda_n$ whose support lies mostly in higher dimensional subspaces, the information content will be compressed from



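$$\begin{aligned}
I &= \log(\dim\mathcal{H}_Q)\, L && (127)\\
&= \log(\dim\mathcal{H}_Q) \sum_{n=0}^{\infty} n\, \Pi_n && (128)
\end{aligned}$$

to

$$\begin{aligned}
I_c &= \sum_{n=0}^{\infty} r \log(\dim\mathcal{H}_C)\, \Pi_T^n && (129)\\
&= \sum_{n=0}^{\infty} \log(\dim V_\delta^n)\, \Pi_T^n && (130)\\
&\approx \sum_{n=0}^{\infty} n\, S(\rho)\, \Pi_T^n, && (131)
\end{aligned}$$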



where

$$\Pi_n = \sum_{a^n} |a^n\rangle\langle a^n| \tag{132}$$

is the projector onto the subspace $\mathcal{H}_Q^n$ of length $n$ messages and

$$\Pi_T^n := \sum_{|a^n\rangle \in T_\delta^n} |a^n\rangle\langle a^n| \tag{133}$$

is the projector on the typical subspace of length $n$ messages. Since $S(\rho) \le \log(\dim\mathcal{H}_Q)$ and $\Pi_T^n \le \Pi_n$ we have

$$I_c \le I, \tag{134}$$

i.e. the generalized Schumacher code is a compression code on the entire many-letter space. For grand canonical messages an optimal compression will be achieved. The raw information content of the source messages then reads

$$I(\sigma) = \sum_{n=0}^{\infty} \lambda_n\, n \log(\dim\mathcal{H}_Q). \tag{135}$$

The Schumacher code compresses the raw information content to

$$I_c(\sigma) \approx \sum_{n=0}^{\infty} \lambda_n\, n\, S(\rho)\, P_T^n \tag{136}$$

qbits, where $P_T^n = \mathrm{Tr}\{\rho^{\otimes n}\Pi_T^n\}$ is the probability of a block message of length $n$ lying in the typical subspace $V_\delta^n$. If the support of the length distribution $\lambda_n$ is on subspaces of dimensions being high enough, the confidence of the code is still acceptable, i.e.

$$P_T := \mathrm{Tr}\{\sigma \Pi_T\} = \sum_{n=0}^{\infty} \lambda_n P_T^n > 1 - \epsilon \tag{137}$$

is achievable for any $\epsilon, \delta > 0$. The projector $\Pi_T$ onto the typical many-letter subspace $V_\delta$ is defined by

$$\Pi_T := \sum_{n=0}^{\infty} \Pi_T^n. \tag{138}$$



VI. LOSSLESS COMPRESSION 



A compression code always makes use of statistical properties of the source message ensemble. As already stated in section III D, a compression code can be realized in two ways:

Type 1 (Lossy): Compress the most probable messages and forget the rest, or

Type 2 (Lossless): Compress the most probable messages and enlarge the rest.

Since the latter involves codewords of variable length, it can hardly be realized on block spaces. Nevertheless, an implementation of Huffman coding into quantum information theory based on block spaces has been worked out by Braunstein et al. (see [3]), but due to the restriction to block spaces this coding scheme is not a lossless scheme. In the framework of many-letter quantum information theory, however, lossless compression is realizable in the following way.



A. Compressing grand canonical messages 



Note, however, that for a given source message ensemble the fidelity can only be increased by a higher tolerance $\delta$ of the typical subspaces, which results in a bad compression. Only if Alice chooses a suitable length distribution can she achieve both optimal compression and reliable transmission. In the limit where the support of the length distribution $\lambda_n$ is shifted to $n \to \infty$ we have $P_T \to 1$ and $\log(\dim V_\delta^n) \to n S(\rho)$, hence

$$I_c(\sigma) \to \sum_{n=0}^{\infty} \lambda_n\, n\, S(\rho). \tag{139}$$

In this limit, each of the perfectly distinguishable canonical components $\rho^{\otimes n}$ of $\sigma$ is compressed to $n S(\rho)$ qbits. The total compressed message is the sum of the compressed components, weighted by $\lambda_n$. Hence also for grand canonical messages one can say that the Schumacher code compresses each message to $S(\rho)$ qbits per letter. This confirms the result already obtained in the last section. Note, however, that the notion of a compression per letter only makes sense in the case of (grand) canonical messages. Other types of messages cannot be Schumacher compressed, simply because for them there is no letter matrix $\rho$. Hence in the context of Schumacher compression the von Neumann entropy does not yet have a fundamental meaning. In the next section we will introduce a lossless compression scheme applying to all messages, which finally establishes the von Neumann entropy as the amount of core quantum information of any given message ensemble.
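The limiting statement about grand canonical messages amounts to simple arithmetic over the length distribution. The sketch below assumes a hypothetical length distribution $\lambda_n$ supported on three long block lengths, a qbit source alphabet, and an assumed entropy per letter; all names and numbers are illustrative.

```python
import math

def raw_information(lengths, dim_q):
    """Raw content sum_n lambda_n * n * log2(dim H_Q), cf. (135)."""
    return sum(lam * n * math.log2(dim_q) for n, lam in lengths.items())

def compressed_information(lengths, entropy):
    """Limiting compressed content sum_n lambda_n * n * S(rho), cf. (139)."""
    return sum(lam * n * entropy for n, lam in lengths.items())

lengths = {100: 0.2, 200: 0.5, 400: 0.3}   # hypothetical lambda_n, sums to 1
S_rho = 0.469                               # assumed entropy per letter in qbits

raw = raw_information(lengths, dim_q=2)
compressed = compressed_information(lengths, S_rho)
print(raw, compressed, compressed / raw)
```

The ratio of the two values is the entropy per letter divided by $\log(\dim\mathcal{H}_Q)$, independent of the length distribution.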



A symbol quantum code over the alphabet $Q$ can be represented by a single-letter encoder

$$C_Q := \sum_a |c(a)\rangle\langle a|, \tag{140}$$

where $B_Q = \{|a\rangle\}_a$ is a basis letter set spanning the letter space $\mathcal{H}_Q$ and $|c(a)\rangle$ is a string of code letters taken from an orthogonal code alphabet $B_C = \{|c\rangle\}$. The length $L_c(a)$ of the codeword is given by

$$\hat{L}_c\, |c(a)\rangle = L_c(a)\, |c(a)\rangle. \tag{141}$$

The extension of the code $c$ to strings of arbitrary length can be given by

$$|c(a^n)\rangle := |c(a_1) \cdots c(a_n)\rangle, \tag{142}$$

so the total length of the encoded message $|a^n\rangle$ reads

$$\hat{L}_c\, |c(a^n)\rangle = L_c(a^n)\, |c(a^n)\rangle, \tag{143}$$

where $L_c(a^n) := L_c(a_1) + \ldots + L_c(a_n)$. The code must be uniquely decodable, i.e. $c(a^n) \ne c(a'^m)$ for $a^n \ne a'^m$, so the code messages must fulfill

$$\langle c(a^n)|c(a'^m)\rangle = 0 \quad \text{for } a^n \ne a'^m. \tag{144}$$

The total encoder of all messages is constructed by

$$C := \sum_{n=0}^{\infty} C_Q^{\otimes n}, \tag{145}$$

where

$$C_Q^{\otimes 0} := |\cdot\rangle\langle\cdot|, \tag{146}$$
$$C_Q^{\otimes n} := \underbrace{C_Q \otimes \cdots \otimes C_Q}_{n}, \tag{147}$$

i.e. the empty message stays empty and all other strings are encoded letter by letter. The encoder, which can also be written as

$$C = \sum_{n=0}^{\infty}\sum_{a^n} |c(a^n)\rangle\langle a^n|, \tag{148}$$

is an isometric operator on the many-letter space $\mathcal{M}_Q$, since

$$\begin{aligned}
C^\dagger C &= \sum_{n,m=0}^{\infty}\sum_{a^n, a'^m} |a^n\rangle\langle c(a^n)|c(a'^m)\rangle\langle a'^m| && (149)\\
&= \sum_{n=0}^{\infty}\sum_{a^n} |a^n\rangle\langle a^n| = \mathbb{1}. && (150)
\end{aligned}$$
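On basis strings the symbol code acts like a classical variable-length code, so unique decodability can be checked by brute force on short strings. The sketch below assumes a hypothetical prefix-free binary code for a three-letter basis alphabet; classical strings stand in for the orthogonal basis states, and all names are illustrative.

```python
from itertools import product

# Hypothetical prefix-free binary code for a three-letter basis alphabet.
CODE = {"a": "0", "b": "10", "c": "11"}

def encode(message: str) -> str:
    """Extend the symbol code to strings letter by letter, cf. (142)."""
    return "".join(CODE[letter] for letter in message)

def encoded_length(message: str) -> int:
    """Total length as the sum of the single codeword lengths, cf. (143)."""
    return sum(len(CODE[letter]) for letter in message)

def uniquely_decodable_up_to(max_len: int) -> bool:
    """Check that distinct source strings up to max_len get distinct codewords."""
    seen = {}
    for n in range(max_len + 1):
        for letters in product(CODE, repeat=n):
            msg = "".join(letters)
            word = encode(msg)
            if word in seen and seen[word] != msg:
                return False          # two different messages collide
            seen[word] = msg
    return True

print(encode("abc"), encoded_length("abc"), uniquely_decodable_up_to(5))
```

Distinct codewords for distinct source strings correspond to the orthogonality condition on the code messages, which is what makes the total encoder isometric. The check covers only strings up to the given length, so it is a sanity test rather than a proof.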



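Alice now chooses her messages from the grand canonical message ensemble

$$\sigma = \sum_{n=0}^{\infty} \lambda_n\, \rho^{\otimes n}, \tag{151}$$

where the letter matrix $\rho$ is given by

$$\rho = \sum_x p(x)\, |x\rangle\langle x| \tag{152}$$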



Since the Huffman code is a binary code, the average length equals the average information content of the letter ensemble:

$$I_c(\rho) = \log(\dim\mathcal{H}_C)\, L_c(\rho) = L_c(\rho), \tag{155}$$

whereas the Shannon entropy of the basis letter ensemble equals the von Neumann entropy of the letter matrix, $H(A) = S(\rho)$, thus we have $I_c(\rho) = S(\rho)$. Since $S(\rho^{\otimes n}) = n S(\rho)$, the grand canonical message ensemble (151) contains

$$I_c(\sigma) = \sum_{n=0}^{\infty} \lambda_n\, n\, S(\rho) = S(\sigma) \tag{156}$$

qbits of encoded information on the average. Since the original information content is $I(\sigma) = \sum_{n=0}^{\infty} \lambda_n\, n \log(\dim\mathcal{H}_Q)$ and since $S(\rho) \le \log(\dim\mathcal{H}_Q)$, this coding scheme is a compressive code according to (71).
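The relation between Huffman codeword lengths and the entropy can be illustrated classically on the eigenvalues of the letter matrix. The sketch below builds binary Huffman codeword lengths with a heap; the spectra are hypothetical, and the function names are illustrative. For a dyadic spectrum the lengths equal $-\log q(a)$ exactly; otherwise the average length lies between the entropy and the entropy plus one.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the codeword length of each letter for a binary Huffman code."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, letter indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1          # each merge adds one bit to the subtree
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

q = [0.5, 0.25, 0.125, 0.125]        # hypothetical dyadic spectrum of rho
lengths = huffman_lengths(q)
average = sum(p * l for p, l in zip(q, lengths))
print(lengths, average, entropy(q))
```

The "ideal" Huffman code of the text corresponds to the dyadic case, where no rounding of the lengths to integers is needed.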



B. Compressing general messages 



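In analogy to section III G we can introduce a general coding scheme that optimally compresses an arbitrary message ensemble

$$\sigma = \sum_{|\psi\rangle \in \Gamma} p(\psi)\, |\psi\rangle\langle\psi| \tag{157}$$

over a given source alphabet $Q = \{|x\rangle\}$ without any loss of information. Let

$$\sigma = \sum_i q_i\, |e_i\rangle\langle e_i| \tag{158}$$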



with the diagonalization

$$\rho = \sum_a q(a)\, |a\rangle\langle a|, \tag{153}$$

i.e. we have chosen the basis alphabet $B_Q$ such that $\rho$ becomes diagonal. To Bob it appears as if Alice were sending him perfectly distinguishable messages $|a^n\rangle$ over the alphabet $B_Q$ distributed by $q(a^n) = q(a_1) \cdots q(a_n)$. Hence it is a good idea to invoke a Huffman coding scheme (see section III F) mapping each letter $|a\rangle$ to a binary codeword $|c(a)\rangle$ of length $L_c(a) = -\log q(a)$. Again, the above length is in general not an integer and one has to take the next larger integer instead. We nevertheless regard the above number as an ideal length provided by an ideal Huffman code. Since it is an optimal code, the average length of the encoded letter ensemble is minimized to the Shannon entropy of the basis letter ensemble

$$L_c(\rho) = H(A) = -\sum_a q(a) \log q(a) \tag{154}$$



be a diagonalization of $\sigma$, where the $|e_i\rangle$'s are eigenvectors of $\sigma$ to nonzero eigenvalues $q_i > 0$ and are generally not product messages but rather superpositions of strings $|a^n\rangle$ over some orthogonal basis alphabet $B_Q = \{|a\rangle\}_a$:

$$|e_i\rangle = \sum_{n=0}^{\infty}\sum_{a^n} e_i(a^n)\, |a^n\rangle. \tag{159}$$



Now regard the set $\mathcal{E} := \{|e_i\rangle\}_i$ itself as an alphabet, whose letters are the vectors $|e_i\rangle$, distributed by $q_i$. Then there is a Huffman code mapping each $|e_i\rangle$ to a unique binary codeword $|c(e_i)\rangle = |(c_1 \cdots c_{l_i})(e_i)\rangle$, which is a string of length $l_i = -\log q_i$, taken from the binary basis alphabet $B_C = \{|0\rangle, |1\rangle\}$. Again, the above length is ideal. In real life the Huffman code chooses a codeword with the next integer length above $(-\log q_i)$. Every eigenvector $|e_i\rangle$ of $\sigma$ is mapped to a string $|c(e_i)\rangle$ with $\langle c(e_i)|c(e_j)\rangle = \delta_{ij}$, and

$$\hat{L}_c\, |c(e_i)\rangle = l_i\, |c(e_i)\rangle \tag{160}$$

by the encoder

$$C_\Gamma := \sum_i |c(e_i)\rangle\langle e_i|. \tag{161}$$

The encoder is an isometric operator on the source message space

$$\mathcal{M}_\Gamma = \mathrm{Span}(\Gamma), \tag{162}$$



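i.e. $C_\Gamma^\dagger C_\Gamma = \mathbb{1}_{\mathcal{M}_\Gamma}$. Message components outside $\mathcal{M}_\Gamma$ are translated to the code space in the following way. Let

$$\Pi_\Gamma := \sum_i |e_i\rangle\langle e_i| \tag{163}$$

be the projector onto the subspace $\mathcal{M}_\Gamma$ and $T : \mathcal{M}_Q \to \mathcal{M}_C$ be a translator from the source alphabet $Q$ to the code alphabet $Q_C$ (see section V E) that fulfills

$$\langle c(e_i)|T|\psi\rangle = 0 \quad \forall\, |\psi\rangle \in \mathcal{M}_{\neg\Gamma},\ \forall i, \tag{164}$$

with $\mathcal{M}_{\neg\Gamma}$ being the subspace orthogonal to $\mathcal{M}_\Gamma$. Hence the operator

$$T_{\neg\Gamma} := (\mathbb{1} - \Pi_\Gamma)\, T\, (\mathbb{1} - \Pi_\Gamma) \tag{165}$$

translates only message components outside the subspace $\mathcal{M}_\Gamma$ into code messages being orthogonal to any of the $|c(e_i)\rangle$, i.e. $C_\Gamma^\dagger T_{\neg\Gamma} = T_{\neg\Gamma}^\dagger C_\Gamma = 0$. That way the total encoder

$$C := C_\Gamma + T_{\neg\Gamma} \tag{166}$$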



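is an isometric encoder from the source space $\mathcal{M}_Q$ to the code space $\mathcal{M}_C$. The encoded length is observed by

$$\begin{aligned}
L_c &= C^\dagger \hat{L}_c C = C_\Gamma^\dagger \hat{L}_c C_\Gamma + T_{\neg\Gamma}^\dagger \hat{L}_c T_{\neg\Gamma} && (167)\\
&= \sum_i l_i\, |e_i\rangle\langle e_i| + R\,(\mathbb{1} - \Pi_\Gamma)\, L\, (\mathbb{1} - \Pi_\Gamma), && (168)
\end{aligned}$$

where $R$ is the rate of the translation code $T$ fulfilling

$$R \ge \log(\dim\mathcal{H}_Q). \tag{169}$$

The encoded information is observed by

$$I_c = L_c = \sum_i l_i\, |e_i\rangle\langle e_i| + \frac{R}{\log(\dim\mathcal{H}_Q)}\, I_{\neg\Gamma}, \tag{170}$$

where

$$I_{\neg\Gamma} := (\mathbb{1} - \Pi_\Gamma)\, I\, (\mathbb{1} - \Pi_\Gamma) \tag{171}$$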



observes the information content of components outside $\mathcal{M}_\Gamma$. Any a priori message $|\psi\rangle \in \Gamma$ from Alice is encoded into a superposition of Huffman strings of distinct lengths. Alice's entire source message ensemble $\sigma$, given by (157), obtains an encoded length of $L_c(\sigma) = \sum_i q_i l_i$, so the encoded ensemble information reads

$$I_c(\sigma) = \sum_i q_i\, l_i. \tag{172}$$

For an ideal Huffman code providing $l_i = -\log q_i$ the above value is minimized to

$$I_c(\sigma) = -\sum_i q_i \log q_i = S(\sigma). \tag{173}$$

In the case of canonical messages $\sigma = \rho^{\otimes N}$ the encoded information reads

$$I_c(\rho^{\otimes N}) = S(\rho^{\otimes N}) = N S(\rho), \tag{174}$$

hence optimal compression is achieved in any case.

C. Core quantum information content 



In analogy to section III H one may define an observable core information content respecting a source ensemble $\sigma$, given by

$$\sigma = \sum_{|\psi\rangle \in \Gamma} p(\psi)\, |\psi\rangle\langle\psi| = \sum_i q_i\, |e_i\rangle\langle e_i|. \tag{175}$$

For an ideal code the rate $R$ of the translation part fulfills $R = \log(\dim\mathcal{H}_Q)$, whereas the lengths of the compression part fulfill $l_i = -\log q_i$. Hence the core information content can be defined as

$$I_0 := -\log\sigma + I_{\neg\Gamma}, \tag{176}$$

where $I_{\neg\Gamma}$, given by (171), measures the uncompressed information content outside $\mathcal{M}_\Gamma$, and $(-\log\sigma)$ measures the compressed information content inside $\mathcal{M}_\Gamma$. The core information content of a general message $\rho \in S(\mathcal{M}_Q)$ is then defined by

$$I_0(\rho) := \mathrm{Tr}\{\rho\, I_0\}. \tag{177}$$

The above value indicates the number of qbits being engaged on the average by communicating $\rho$ over a lossless channel that is fully optimized respecting the ensemble $\sigma$. For example, the core information of each a priori message $|\psi\rangle \in \Gamma$ that Alice sends is given by

$$I_0(\psi) = -\langle\psi|\log\sigma|\psi\rangle = -\sum_i \log q_i\, |\langle e_i|\psi\rangle|^2. \tag{178}$$

Any other message $|\psi\rangle \in \mathcal{M}_Q$ may also be sent without loss of information, but its compression is not optimized and might be poor, indicated by large values of $I_0(\psi)$. The core information content of the source matrix itself equals its von Neumann entropy

$$I_0(\sigma) = \sum_{|\psi\rangle \in \Gamma} p(\psi)\, I_0(\psi) = -\mathrm{Tr}\{\sigma \log\sigma\} = S(\sigma). \tag{179}$$

In this very sense the von Neumann entropy is the core quantum information contained in a message matrix $\sigma$. For any given $\sigma$ Alice can design a lossless quantum code that minimizes the effort of communicating all a priori message ensembles being equivalent to $\sigma$. The core information is a quantum mechanical observable that yields the number of qbits that would be engaged if the message were communicated using a lossless compression code optimized for $\sigma$. The average core information of any message ensemble equivalent to $\sigma$ equals its von Neumann entropy. This confirms the meaning that is commonly assigned to the von Neumann entropy and puts it on solid ground.
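The statement that the average core information equals the von Neumann entropy can be verified on a small example. The sketch below assumes a hypothetical single-qbit ensemble of two non-orthogonal real states, $|0\rangle$ and $|+\rangle$, each with probability 1/2, and diagonalizes the resulting real symmetric 2x2 source matrix in closed form; all names are illustrative, and only real amplitudes are handled.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues and orthonormal eigenvectors of [[a, b], [b, d]] (real, b != 0)."""
    mean, diff = (a + d) / 2.0, (a - d) / 2.0
    root = math.hypot(diff, b)
    values = [mean + root, mean - root]
    vectors = []
    for lam in values:
        v = (b, lam - a)                     # solves (a - lam) x + b y = 0
        norm = math.hypot(*v)
        vectors.append((v[0] / norm, v[1] / norm))
    return values, vectors

def core_information(psi, values, vectors):
    """I_0(psi) = -sum_i log2(q_i) |<e_i|psi>|^2, cf. (178), real amplitudes only."""
    return -sum(math.log2(q) * (e[0] * psi[0] + e[1] * psi[1]) ** 2
                for q, e in zip(values, vectors))

# Hypothetical ensemble: |0> and |+>, each with probability 1/2.
s = 1.0 / math.sqrt(2.0)
ensemble = [((1.0, 0.0), 0.5), ((s, s), 0.5)]

# Entries of sigma = sum_psi p(psi) |psi><psi|.
a = sum(p * psi[0] * psi[0] for psi, p in ensemble)
b = sum(p * psi[0] * psi[1] for psi, p in ensemble)
d = sum(p * psi[1] * psi[1] for psi, p in ensemble)

values, vectors = eig_sym_2x2(a, b, d)
entropy = -sum(q * math.log2(q) for q in values)
average_core = sum(p * core_information(psi, values, vectors)
                   for psi, p in ensemble)
print(entropy, average_core)
```

Although the two messages are not orthogonal, the weighted average of their core information reproduces the entropy of the source matrix, which is well below one qbit in this example.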



VII. SUMMARY 



Within the framework of many-letter theory, a general characterization of quantum codes using the Kraus representation of completely positive maps has been given. An observable has been constructed measuring the raw quantum information content of a particular message, where the unit of its value has been given the name "1 qbit". This type of quantum information content is merely related to the effort it takes to communicate a particular quantum message. It is not based on statistical properties of a message ensemble. Compression codes are defined by their property of reducing the quantum information content of a given message ensemble. A general form of translation codes has been given that translate between two alphabets without loss of information. It is shown that these codes, as expected, are never compressive. The formalism has then been applied to the Schumacher coding scheme, which is only defined on a special type of messages, so-called canonical messages, to see that the expected quantum information content per letter, represented by the introduced observable, can be reduced to the von Neumann entropy, according to the known result. The Schumacher coding scheme has then been extended to a more general type of messages, so-called grand canonical messages. However, as the Schumacher code can only be applied to messages of this type, the von Neumann entropy has not yet obtained its fundamental meaning. This has been changed by constructing a lossless coding scheme for all messages providing optimal compression and perfect retrieval of the original data. The given coding scheme exploits the features of many-letter spaces and cannot be implemented in standard block Hilbert spaces. Motivated by the concept of lossless compression, an observable is constructed measuring the core information content of a particular message with respect to a given a priori message ensemble. The expectation value of the a priori message ensemble itself equals its von Neumann entropy. Hence, in the context of lossless compression, the von Neumann entropy can be interpreted as the expected core quantum information content of a message ensemble that remains when any redundancy due to statistical predictability has been removed. This confirms the commonly assigned meaning of the von Neumann entropy.



VIII. ACKNOWLEDGEMENTS 

I would like to thank Jens Eisert, Timo Felbinger, 
Alexander Albus, and Shash Virmani for fruitful and in- 
tensive discussions about the topic of this paper. 



[1] K.J. Bostroem, Concepts of a quantum information theory of many letters, LANL eprint quant-ph/0009052 (2000).

[2] C.E. Shannon and W. Weaver, A mathematical theory of communication, The Bell System Technical Journal 27, 379-423, 623-656 (1948).

[3] S.L. Braunstein, C.A. Fuchs, D. Gottesman, and H.-K. Lo, A quantum analog of Huffman coding, in IEEE International Symposium on Information Theory (1998). http://xxx.lanl.gov/abs/quant-ph/9805080

[4] P. Hausladen, R. Jozsa, B. Schumacher, M. Westmoreland, and W.K. Wootters, Classical information capacity of a quantum channel, Phys. Rev. A 54(3), 1869-1876 (1996).

[5] D.J.C. MacKay, Information theory, inference, and learning algorithms, http://wol.ra.phy.cam.ac.uk/mackay/itprnn/book.html (1995-2000).

[6] J. Preskill, Lecture notes, http://www.theory.caltech.edu/people/preskill/ph219/ (1997-1999).



